Institutional knowledge is everything that employees have learned within the company and that the organisation should retain after they leave.
Digitisation in chemical research - from data to knowledge
As the pharmaceutical and fine chemical industries continue to push for greater innovation while facing constant pressure to improve R&D productivity, companies have invested in enterprise-level software solutions to improve the sharing of information and best practices between scientists at all levels of their organisations, and to build institutional knowledge. At the same time, companies have also invested in a range of technologies at the bench level to increase understanding of the chemical processes they develop. However, the value of these bench- and enterprise-level systems is only fully realised if data are properly extracted at the bench and aggregated at the enterprise level in a complete and efficient way.
The reality is that data collection is currently far from straightforward and is often done in a haphazard manner: observations are written on paper or moved by copy-and-paste. At best, this approach is error-prone; in many cases the data never become available in the company's systems at all.
An estimated 85% of data in laboratory processes is lost due to gaps in automation (*estimate based on missing data transfer protocols, human error during transfer and data not collected during the experiment)
Addressing these gaps is essential to build institutional knowledge and to get a full return on the capital invested in these systems. Data must be collected in the laboratory and transferred to the right place. To date, this has largely been achieved for simple analytical data, which is essentially a series of individual results for individual samples. For more complex process data, current systems are not designed to make collection simple or to provide the information and analysis required.
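To make that distinction concrete, the sketch below contrasts the two kinds of data. The column names and values are invented for illustration and do not come from any particular system.

```python
import pandas as pd

# Simple analytical data: one row of results per sample (e.g. purity by HPLC).
analytical = pd.DataFrame({
    "sample_id": ["S-001", "S-002", "S-003"],
    "purity_pct": [98.7, 97.9, 99.1],
})

# Process data: continuous, time-resolved signals from the reactor and probes,
# typically thousands of points per experiment rather than one row per sample.
process = pd.DataFrame({
    "timestamp": pd.date_range("2023-05-04 08:00", periods=5, freq="1min"),
    "reactor_temp_C": [24.8, 25.1, 25.4, 25.2, 25.0],
    "dosing_rate_mL_min": [0.0, 0.5, 0.5, 0.5, 0.0],
})

print(analytical)
print(process)
```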
The changing face of research
A number of common issues have emerged in almost all R&D organisations in the pharmaceutical and fine chemicals sectors over the last five years. To be globally competitive, organisations need to innovate more, and faster, than ever before, and to do so with fewer resources; in other words, they need to increase productivity. As a result of these industry drivers, R&D practices are changing and continue to evolve rapidly. Technology allows researchers to learn more about reactions and processes, and several technologies have improved the storage of experimental results. However, much of the data never reaches these repositories. Instead, it is stored in unstable environments such as USB drives or stand-alone computers that are not easily accessible or regularly backed up, or on intranet platforms where it cannot easily be retrieved for further use and processing. In fact, it is estimated that 85% of the data collected during process research or synthesis is lost because it never reaches the right data repository, leaving organisations at very high risk of data loss and significantly reducing the creation and quality of the institutional knowledge base.
Why collect data?
"I think people underestimate how little truly clean data there is and how difficult it is to cleanse and link data." - Forbes, Interview with Novartis CEO Vas Narasimhan, 16 January 2019
There are many reasons for collecting data, but in short, data provide an opportunity to answer questions that support business needs. An aircraft goes through an extensive testing and certification process before it can be used commercially, and chemical processes are in many ways analogous. If an aircraft does not fly as expected, the data can help engineers understand why. Similarly, if a chemical reaction or process does not go as expected, data can help scientists understand what actually happened.
Data help to build an understanding of how a reaction occurs or how a process is affected by a change in parameters. Data can also help further improve products and processes, and sharing this knowledge with colleagues is an important step in building institutional knowledge.
Perspective is the key word here. Data collected during process development tend to be analysed from the perspective held at the time of the experiment: the data are used to answer the question at hand and are stored or recorded accordingly. Importantly, the scientist may not know what the right question is at the time of the investigation. This may only become apparent months or years later, usually when only a few notes in a notebook remain from the original experiment and the data files themselves have since been deleted or stored somewhere the user can no longer remember or find. Hindsight is a wonderful thing, so having the data available to look back on and interrogate with a new question can be very useful and may prevent the need to repeat experiments.
Challenges for industry and academic institutions
Institutional knowledge building is becoming a key topic in many sectors. For several years, companies have invested heavily in ELNs and data warehousing systems where they develop their knowledge as an organisation. However, institutional knowledge can only be acquired if data is collected and stored in a way that makes it searchable at a later date.
This is one of the main problems to be solved. The ELN can serve as an enterprise portal where valuable information is stored and made accessible. But that value is only realised if all the data, as well as the conclusions drawn from it, are actually entered into the system. The reality is that this remains a difficult step for researchers and chemists today, and a significant amount of data never makes it into the storage system. This was highlighted in a study showing that chemists often store laboratory data in outdated or unstable formats, or in formats that are difficult for others to access and interpret. The study also identified the need for better support in data management and data presentation.
Linked to the conclusions drawn from the data is the reporting of the experiment. The more complex the experiment, the more report writing becomes a bottleneck. Data have to be transferred, and graphs plotted and analysed, often from multiple data streams and sources. Even basic tasks such as timestamping can be cumbersome, yet consistent timestamps are essential to interpret the data correctly. All of this takes time that the scientist could instead spend drawing conclusions or working on the next experiment. So another key problem to be solved is reducing the time taken to combine all the relevant data streams.
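As an illustration of what combining data streams involves in practice, the sketch below aligns two hypothetical time-stamped streams on a common time axis with pandas; the instrument names, column names and values are invented for the example.

```python
import pandas as pd

# Two hypothetical data streams from the same experiment, recorded by
# different instruments with their own clocks and sampling rates.
reactor = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-05-04 08:00:00", "2023-05-04 08:00:30", "2023-05-04 08:01:00"]),
    "reactor_temp_C": [25.0, 25.6, 26.1],
})
spectrometer = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-05-04 08:00:05", "2023-05-04 08:00:35", "2023-05-04 08:01:05"]),
    "peak_area": [120.4, 134.9, 151.2],
})

# Align the streams: for each spectrometer point, take the most recent
# reactor reading within a 60-second tolerance.
merged = pd.merge_asof(
    spectrometer.sort_values("timestamp"),
    reactor.sort_values("timestamp"),
    on="timestamp",
    direction="backward",
    tolerance=pd.Timedelta("60s"),
)
print(merged)
```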
Industry vision
Getting process recipes and knowledge collected in a central place has been a theme in the industry for the last 10 years or so. LIMS have long been used to store analytical data, but the collation of process data is a more recent trend. Why is this important? Some companies see it as an opportunity to create a recipe-based data warehouse in which a system-independent recipe model can be built using industry standards such as S-88 and S-95 [3,4]. Another important trend is the application of ML and AI to scientific data to ease the tedious data mining work that many (data) scientists have to do and to speed up the generation of new insights.
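To give a feel for what such a recipe model might look like, here is a minimal, hypothetical sketch of an S-88-style hierarchy (recipe procedure, unit procedure, operation, phase) as Python dataclasses. The class names, fields and example values are assumptions made for illustration, not taken from the standards or any specific product.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Phase:
    name: str                                        # e.g. "Dose reagent B"
    parameters: dict = field(default_factory=dict)   # e.g. {"rate_mL_min": 0.5}

@dataclass
class Operation:
    name: str
    phases: List[Phase] = field(default_factory=list)

@dataclass
class UnitProcedure:
    unit: str                                        # e.g. "1 L jacketed reactor"
    operations: List[Operation] = field(default_factory=list)

@dataclass
class RecipeProcedure:
    name: str
    unit_procedures: List[UnitProcedure] = field(default_factory=list)

# Example: a system-independent description of a simple heating and dosing step.
recipe = RecipeProcedure(
    name="Amide coupling, step 2",
    unit_procedures=[UnitProcedure(
        unit="1 L jacketed reactor",
        operations=[Operation(
            name="Reaction",
            phases=[
                Phase("Heat to 40 C", {"setpoint_C": 40, "ramp_C_min": 1.0}),
                Phase("Dose reagent B", {"rate_mL_min": 0.5, "total_mL": 120}),
            ],
        )],
    )],
)
print(recipe.name, "-", len(recipe.unit_procedures[0].operations[0].phases), "phases")
```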
However, once again, this is only possible if the data is collected consistently throughout the development process and placed correctly in the right data repository. As the tools used during these studies have become more sophisticated, the volume and complexity of the data collected have increased, making it more difficult to ensure that the data are correctly collected and stored. The more this process can be automated, the more reliable it becomes.
Data quality checklist (a minimal validation sketch follows this list):
- Encompassing - Do we have all the data points, regardless of whether we think they apply to this one experiment?
- Accurate and reliable - Are we confident that the data are truly representative?
- Structured and relevant to context - Is the information usable and will it make sense later?
- Standardised and consistent - Does everyone work in the same way?
- Retention - Is there a systematic way to flag 'bad' data?
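Several of these criteria can be checked automatically when data enter the repository. The sketch below is a minimal, hypothetical example of such checks; the column names, plausibility ranges and naming convention are assumptions made for illustration.

```python
import pandas as pd

# Columns expected for every experiment (illustrative, not from any system).
REQUIRED_COLUMNS = ["experiment_id", "timestamp", "reactor_temp_C", "operator"]

def check_quality(df: pd.DataFrame) -> list:
    """Return a list of human-readable findings for one experiment's data."""
    findings = []
    # Encompassing: are all expected data points / columns present?
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        findings.append(f"missing columns: {missing}")
    # Accurate and reliable: simple plausibility range for a temperature signal.
    if "reactor_temp_C" in df.columns and not df["reactor_temp_C"].between(-80, 300).all():
        findings.append("reactor_temp_C outside plausible range")
    # Standardised and consistent: a shared naming convention for experiments.
    if "experiment_id" in df.columns and not df["experiment_id"].str.match(r"EXP-\d{4}").all():
        findings.append("experiment_id does not follow EXP-#### convention")
    # Retention: flag suspect rows rather than deleting them, so nothing is silently lost.
    if "timestamp" in df.columns:
        df["flag_duplicate"] = df["timestamp"].duplicated()
    return findings

data = pd.DataFrame({
    "experiment_id": ["EXP-0042"] * 3,
    "timestamp": pd.date_range("2023-05-04 08:00", periods=3, freq="1min"),
    "reactor_temp_C": [25.0, 25.3, 25.1],
    "operator": ["jdoe"] * 3,
})
print(check_quality(data))   # an empty list means no findings
```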
Data governance gap
A brief analysis of the average chemistry workflow in a laboratory reveals the extent of the problem. Typically, the chemist identifies the chemistry to be used and prepares the recipe in the ELN, or even writes the reaction scheme on paper. Transferring this recipe to the system equipment is then a manual process. Once the data has been collected during the experiment, another manual process is required to transfer this experimental data from the system equipment to the data storage system. It is easy to understand why a large amount of data is never captured and is lost. In addition, although more and more data is being collected by the system equipment in an automated way, the processing of this data is still done manually, which means that the analysis of the experimental results can take many hours. Often, analysing experiments requires time to merge datasets in packages such as Microsoft® Excel® or to transfer data between different locations.
Finally, although there are well-structured places to store data, there is still no straightforward way to share data within a project. This gap can be filled by automating the collection of data from the system equipment and efficiently transferring it to knowledge management systems such as the ELN and Data Warehouse.
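In principle, the first part of that gap can be narrowed with fairly simple automation. The following is a minimal, hypothetical sketch of a service that collects instrument export files and files them into a structured project repository; the paths, file-naming convention and folder structure are all assumptions, and this is not how any particular commercial product works.

```python
import shutil
import time
from pathlib import Path

# Hypothetical folder layout: instruments export result files to EXPORT_DIR,
# and this small service moves them into a structured project repository.
EXPORT_DIR = Path(r"C:\instrument_exports")
REPOSITORY = Path(r"\\fileserver\process-data")

def collect_once() -> None:
    """Move newly exported files into <repository>/<project>/<experiment>/."""
    for exported in EXPORT_DIR.glob("*.csv"):
        # Assume files are named <project>_<experiment>_<detail>.csv
        try:
            project, experiment, _ = exported.stem.split("_", 2)
        except ValueError:
            continue  # leave files that do not follow the convention for manual review
        target_dir = REPOSITORY / project / experiment
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(exported), str(target_dir / exported.name))

if __name__ == "__main__":
    while True:          # simple polling loop; a production system would also
        collect_once()   # handle notifications, retries and audit logging
        time.sleep(60)
```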
Addressing the challenge
Another article discussed how synthesis workstations are becoming common in today's laboratories to meet the need for increased productivity. At the same time, analytical technologies are becoming more widely used, driven by the need to gather information and better understand a chemical reaction or process. Together, these tools are changing the way things are done. New technology is now on the market that addresses the problem of data loss. Supporting the daily work of scientists in the laboratory, the iC Data Center allows all experimental data to be captured, and analytical and process data to be merged to create institutional knowledge.
iC Data Center: aggregate, compare, collaborate
The iC Data Center greatly facilitates the daily work of researchers and their colleagues by ensuring that all experimental data is automatically extracted from instruments, prepared in useful formats (e.g. iC experiment files, Microsoft® Word® and Microsoft® Excel®) and shared in a central file repository. The files are automatically grouped into a structure defined by the client, after which an email notification is sent to the scientist who started the experiment, containing a link to the location of the files. From there, the data can be analysed and the files can be shared with colleagues or company systems such as ELN or Data Warehouse. The solution is designed to prevent data loss, save significant time compared to manual data transfer, enable collaboration between colleagues and ensure that full value is gained from the company's investment in standard automation and analytical tools as well as enterprise-grade software solutions.
Thanks to software solutions such as the iC Data Center, combined with automated laboratory reactors, many chemical synthesis laboratories achieved significant efficiency gains even during the COVID era, when fewer staff were allowed to work in the laboratory. These laboratories have laid the foundations of their digitisation journey and continue to build on them.